Recent progress in randomized motion planners has led to the development of a new class of sampling-based algorithms that provide asymptotic optimality guarantees, notably the RRT* and the PRM* algorithms. Careful analysis reveals that the so-called "rewiring" step in these algorithms can be interpreted as a local policy iteration (PI) step (i.e., a local policy evaluation step followed by a local policy improvement step), so that asymptotically, as the number of samples tends to infinity, both algorithms converge to the optimal path almost surely (i.e., with probability 1). Policy iteration, along with value iteration (VI), is a common method for solving dynamic programming (DP) problems. Based on this observation, the RRT$^{\#}$ algorithm has recently been proposed, which performs, during each iteration, Bellman updates (a.k.a. "backups") on those vertices of the graph that have the potential of being part of the optimal path (i.e., the "promising" vertices). The RRT$^{\#}$ algorithm thus utilizes dynamic programming ideas and implements them incrementally on randomly generated graphs to obtain high-quality solutions. In this work, based on this key insight, we explore a different class of dynamic programming algorithms for solving shortest-path problems on random graphs generated by iterative sampling methods. This class of algorithms utilizes policy iteration instead of value iteration, and is thus better suited for massive parallelization. Contrary to the RRT* algorithm, the policy improvement during the rewiring step is not performed only locally, but rather on the set of vertices classified as "promising" during the current iteration. This tends to speed up the whole process. The resulting algorithm, aptly named Policy Iteration-RRT$^{\#}$ (PI-RRT$^{\#}$), is the first of a new class of DP-inspired algorithms for randomized motion planning that utilize PI methods.
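To make the PI step referenced above concrete, the following is a minimal sketch of policy iteration for the shortest-path problem on a fixed weighted digraph. The function name, graph encoding, and fixed-point evaluation scheme are illustrative assumptions, not the paper's implementation; PI-RRT$^{\#}$ itself operates incrementally on a randomly growing graph and restricts the improvement step to the "promising" vertices.

```python
import math

def policy_iteration_shortest_path(vertices, edges, goal):
    """Policy iteration on a weighted digraph. `edges[u]` maps each
    successor v to the edge cost c(u, v). Returns (policy, cost_to_go)."""
    # Arbitrary initial policy: point each non-goal vertex at any successor.
    policy = {u: next(iter(edges[u])) for u in vertices if u != goal and edges[u]}

    while True:
        # Policy evaluation: solve J(u) = c(u, pi(u)) + J(pi(u)) under the
        # current policy pi, here by simple fixed-point sweeps.
        J = {u: (0.0 if u == goal else math.inf) for u in vertices}
        for _ in range(len(vertices)):
            for u, v in policy.items():
                J[u] = edges[u][v] + J[v]

        # Policy improvement: greedily "rewire" each vertex toward the
        # successor minimizing one-step cost plus cost-to-go.
        stable = True
        for u in policy:
            best = min(edges[u], key=lambda v: edges[u][v] + J[v])
            if edges[u][best] + J[best] < edges[u][policy[u]] + J[policy[u]] - 1e-12:
                policy[u] = best
                stable = False
        if stable:
            return policy, J

if __name__ == "__main__":
    V = {"a", "b", "c", "goal"}
    E = {"a": {"b": 1.0, "c": 4.0}, "b": {"c": 1.0, "goal": 5.0},
         "c": {"goal": 1.0}, "goal": {}}
    pi, J = policy_iteration_shortest_path(V, E, "goal")
    print(pi, J)  # pi traces a -> b -> c -> goal, with J["a"] == 3.0
```

Note that the improvement sweep treats each vertex independently given the evaluated cost-to-go, which is why PI-style updates lend themselves to massive parallelization, as claimed above.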